The ALICE Software Release Validation cluster
Abstract
One of the most important steps of the software lifecycle is Quality Assurance: this process comprises both automatic tests and manual reviews, all of which must pass before the software is approved for production. Some tests, such as static analysis of the source code, are executed on a single dedicated service; in High Energy Physics, a full simulation and reconstruction chain on a distributed computing environment, backed by a reference "golden" dataset, is also necessary for the quality sign-off. The ALICE experiment uses dedicated, virtualized computing infrastructures for the Release Validation in order not to taint the production environment (i.e. CVMFS and the Grid) with non-validated software and validation jobs: the ALICE Release Validation cluster is a disposable virtual cluster appliance based on CernVM and the Virtual Analysis Facility, capable of deploying on demand, with a single command, a dedicated virtual HTCondor cluster with an automatically scalable number of virtual workers on any cloud supporting the standard EC2 interface. Input and output data are stored externally on EOS, and a dedicated CVMFS service provides the software to be validated. We show how the deployment and disposal of the Release Validation Cluster are completely transparent to the Release Manager, who simply triggers the validation from the web interface of the ALICE build system. CernVM 3, based entirely on CVMFS, makes it possible to boot any past snapshot of the operating system: we show how this allows us to certify each ALICE software release against an exact CernVM snapshot, addressing the problem of Long-Term Data Preservation by ensuring a consistent environment for software execution and data reprocessing in the future.

1 Overview

Ensuring quality releases of the software framework of LHC experiments is a challenging task, given their continuous evolution and the diverse profiles of code authors.
The ALICE experiment [1] adopts a validation procedure for its core framework that involves a full reconstruction and calibration pass over a reference raw dataset and a comparison with the expected results, as well as performance benchmarks. The validation procedure consists of several independent batch jobs, whose results are subsequently merged. Even though their batch nature would make the Grid a suitable running environment, there are at least two reasons why we prefer a dedicated cluster. First, we cannot afford the validation procedure to run into site-specific problems, such as misconfigurations or, in general, non-controlled worker node deployments: any such problem potentially yields false negatives, making it harder to detect software regressions. In addition, running the validation procedure on the Grid would imply deploying release candidates at large scale. ALICE uses CernVM-FS [2] as its software deployment technology: every new published release candidate would taint the central repository catalog with dozens of rejected releases, adding load on the whole system and increasing the chance of a release candidate being erroneously used for a user job. At this point it is clear that a non-chaotic infrastructure, whose configuration is thoroughly verified, should be used for running release validation jobs, and that the Grid is not a suitable candidate.

21st International Conference on Computing in High Energy and Nuclear Physics (CHEP2015), IOP Publishing, Journal of Physics: Conference Series 664 (2015) 022006, doi:10.1088/1742-6596/664/2/022006. Published under licence by IOP Publishing Ltd; content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence, with attribution to the author(s), title, journal citation and DOI.
In order to make a full release validation task entirely reproducible, the running environment must be versionable as well: we have therefore created a portable, self-contained Release Validation Cluster, made of CernVM virtual machines and dynamically scalable on any cloud deployment supporting the EC2 interface [3]. In the following sections we give an overview of the technologies contributing to the Release Validation Cluster (section 2): batch jobs run on the HTCondor [4] batch system, and elastiq [5] provides automatic scalability (section 2.1); release candidates are privately distributed using an embedded, isolated CernVM-FS server (section 2.2); the CernVM operating system [6] ensures environment consistency through snapshots (section 2.3); secure worldwide access to both input and output data is achieved by means of the EOS [14] distributed filesystem (section 2.4). A brief overview of the validation jobs is given in section 3. The ability to run the Release Validation Cluster trivially has been integrated with the web interface of the central ALICE build server (section 4): given its self-contained nature, the cluster can also be run manually on any cloud without the central build service. The release validation procedure has been optimized so that it can feasibly be run for daily software releases (section 4.1).

2 Key technologies

The ALICE Release Validation cluster is a specific application of the CernVM-based Elastic Clusters [7], a technology that is in turn an evolution of the Virtual Analysis Facility [8][9]. CernVM Elastic Clusters were designed to address the problem of temporarily exploiting cloud resources for an isolated task that cannot run on a single virtual machine but needs larger batch-like resources instead. A CernVM Elastic Cluster provides cloud tenants with an all-in-one environment where batch jobs can be submitted to a preconfigured HTCondor instance.
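The automatic scalability mentioned above is driven by elastiq's configuration; as a rough illustration only (this is not elastiq's actual code, and all names and defaults are assumptions), the scale-up/scale-down decision can be sketched as:

```python
# Illustrative sketch of an elastiq-style scaling policy (NOT elastiq's
# real implementation): start workers when jobs are waiting in the queue,
# stop workers that sit idle, within configured quota bounds.

def scale_decision(waiting_jobs, idle_workers, total_workers,
                   min_workers=0, max_workers=10, jobs_per_worker=1):
    """Return how many workers to start (>0) or stop (<0)."""
    if waiting_jobs > 0:
        # Enough new workers to drain the waiting queue, capped by quota.
        wanted = -(-waiting_jobs // jobs_per_worker)  # ceiling division
        return min(wanted, max_workers - total_workers)
    if idle_workers > 0:
        # Shrink the cluster, but never below the minimum quota.
        return -min(idle_workers, total_workers - min_workers)
    return 0

print(scale_decision(waiting_jobs=5, idle_workers=0, total_workers=2))   # 5
print(scale_decision(waiting_jobs=0, idle_workers=3, total_workers=8))   # -3
```

In the real system this decision is taken periodically by elastiq on the head node, which then issues EC2 calls to create or delete worker virtual machines.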
The tenant only takes care of instantiating a single CernVM virtual machine, the head node of the virtual cluster, and can start submitting jobs right away: when properly configured, the head node itself instantiates virtual worker nodes and disposes of them when they are no longer needed. HTCondor jobs running on a CernVM Elastic Cluster do not need to be specially crafted: any HTCondor job using the "vanilla universe" can be executed on those virtual machines. Moreover, once the head node has been instantiated, Elastic Cluster users can submit jobs while remaining entirely unaware of the underlying cloud infrastructure, as virtual machines are created and destroyed transparently. CernVM Elastic Clusters are self-contained, meaning that no external dependency is needed, either on the user's laptop or on the public or private cloud deployment running the virtual machines: the only basic requirements are the ability to run the CernVM image (which includes accessing CernVM-FS either directly or through a proxy) and access to the EC2 interface of the cloud of choice. No knowledge of HTCondor cluster configuration is required: Elastic Clusters come preconfigured, and once their task is done they can be wiped away by simply deleting their virtual machines, leaving no traces behind and optimizing the way virtual resources are exploited. ALICE also uses Elastic Clusters successfully for the interactive use case in its PROOF-based Virtual Analysis Facilities [8][9], and even for running Grid jobs opportunistically on top of the new High-Level Trigger cluster [10], showing that CernVM Elastic Clusters are a simple and effective technology that can be easily adapted to different use cases.

2.1 HTCondor and elastiq

The batch system that comes preconfigured with the Release Validation Cluster is HTCondor [4].
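Since any vanilla-universe job runs unmodified, submitting to the cluster looks exactly like submitting to any HTCondor pool. A minimal submit description (all file names here are purely illustrative) might read:

```
# validation.sub -- minimal HTCondor "vanilla universe" submit description
# (executable and file names are illustrative, not the actual ALICE jobs)
universe   = vanilla
executable = run_validation.sh
arguments  = $(Process)
output     = validation.$(Process).out
error      = validation.$(Process).err
log        = validation.log
queue 10
```

Submitted from the head node with `condor_submit validation.sub`, the ten queued jobs are what elastiq observes when deciding to instantiate new worker virtual machines.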
Among its several features, we are particularly interested in HTCondor's ability to deal with dynamic resources: notably, when a new HTCondor worker node is created, it registers itself with its configured central manager.